An investigation of temporally varying weight regression for noise robust speech recognition
نویسندگان
چکیده
In this paper, recently proposed Temporally Varying Weight Regression (TVWR) is investigated in two ways for noise robust speech recognition. Firstly, since typical model compensation approaches assume that the noise feature is independent and identically distributed, non-stationary noise environment can be poorly compensated using conventional model compensation approaches in the standard Hidden Markov Model (HMM) framework. TVWR, however, maintains both the basic HMM structure and additional time-varying property, therefore, model compensation for TVWR is proposed such that i.i.d. noise assumption can be relaxed. Secondly, although Noise Adaptive Training NAT has been proposed to optimize the “pseudoclean” HMM model for a better performance by maximizing the likelihood of multi-condition data, NAT heavily depends on the simplicity of Vector Taylor Series (VTS) formulation. Hence, other advanced compensation approaches, such as Trajectorybased Parallel Model Combination (TPMC), have difficulties benefiting from this powerful training schema. This paper exploits the time-varying attribute of TVWR to approximate NAT such that any compensation technique can be applied during noise adaptive training. Experiments on the Aurora 4 corpus show that significant improvements over the standard HMM or NAT system can be obtained by compensating TVWR either trained using clean data or adaptively trained using multicondition data.
منابع مشابه
Improving the performance of MFCC for Persian robust speech recognition
The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition but they are very sensitive to noise. In this paper to achieve a satisfactorily performance in Automatic Speech Recognition (ASR) applications we introduce a noise robust new set of MFCC vector estimated through following steps. First, spectral mean normalization is a pre-processing which applies to t...
متن کاملJoint adaptation and adaptive training of TVWR for robust automatic speech recognition
Context-dependent Deep Neural Network has obtained consistent and significant improvements over the Gaussian Mixture Model (GMM) based systems for various speech recognition tasks. However, since DNN is discriminatively trained, it is more sensitive to label errors and is not reliable for unsupervised adaptation. Moreover, DNN parameters do not have a clear and meaningful interpretation, theref...
متن کاملروشی جدید در بازشناسی مقاوم گفتار مبتنی بر دادگان مفقود با استفاده از شبکه عصبی دوسویه
Performance of speech recognition systems is greatly reduced when speech corrupted by noise. One common method for robust speech recognition systems is missing feature methods. In this way, the components in time - frequency representation of signal (Spectrogram) that present low signal to noise ratio (SNR), are tagged as missing and deleted then replaced by remained components and statistical ...
متن کاملAn Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...
متن کاملParameter clustering for temporally varying weight regression for automatic speech recognition
Recently, an implicit trajectory model using temporally varying weight regression (TVWR) was proposed and achieved promising gains using ML training criteria. In the original TVWR, each component weight is modelled as a constrained linear regression function with respect to the monophone posterior feature. Due to the high dimensionality of the posterior feature, many free parameters were introd...
متن کامل